Protein Science — Latest Matching Preprints

1

Computational Redesign of an Antifreeze Protein Using Deep Learning

Calia, C.; Altunc, A. J.; Eufemio, R. J.; Alvarado, B. O.; Huynh, J. D.; Oh, E.; Burkart, M.; Meister, K.; Paesani, F.

2026-06-24 biophysics 10.64898/2026.06.21.733612 medRxiv

Top 0.1%

38.9%

Antifreeze proteins (AFPs) found in various cold-adapted organisms inhibit ice growth and are of interest for applications in food products, cryopreservation, agriculture, and materials science. Although high-resolution structures are available for several AFPs, the amino acids required for full antifreeze activity remain incompletely defined, and the development of AFP variants with properties such as enhanced solubility, high expression yield, and improved thermostability may further facilitate applications. Here, we used the deep learning model ProteinMPNN to redesign the globular fish antifreeze protein AFPIII, keeping the previously reported ice-binding residues fixed. We readily obtained sequences confidently predicted to adopt AFPIII's structure and we selected five designed variants for expression, all of which expressed efficiently in E. coli. Circular dichroism spectroscopy showed that two of these variants retained secondary structure elements consistent with AFPIII, whereas the other three exhibited structural differences. One design was predicted and experimentally confirmed to have increased thermostability. All five variants displayed measurable thermal hysteresis activity. However, none reached the activity of wild-type AFPIII, suggesting that maintaining the currently established set of ice-binding residues is not sufficient to fully preserve this AFP's function; other, unidentified residues can also impact its activity. Our findings highlight the value of deep learning-based protein design methods both for generating AFP variants with desirable properties and for uncovering gaps in existing knowledge of well-characterized AFPs.

2

Prediction-Guided Design of a More Developable FGF21 Construct

Bozkurt, C.; Nathanail, E.; Goteti, A.

2026-07-14 bioengineering 10.64898/2026.07.13.738140 medRxiv

Top 0.1%

31.6%

For structural-biology and protein-production pipelines, the hardest part of a difficult protein is not the biology -- it is obtaining a well-behaved sample for functional studies. Programs routinely stall at construct design, expression, and purification: deciding where to truncate, which tags to use, how to express, and how to purify so the protein survives concentration and handling. These decisions are still made largely by literature precedent and experimental experience, and they require trial-and-error before arriving at a functional construct for hard targets. We present a prospective, single-pair wet-lab case study testing whether an integrated computational platform can improve these decisions. For human fibroblast growth factor 21 (FGF21) -- a clinically important and stability-challenged metabolic hormone -- we compared two expression constructs produced side by side under the same experimental workflow, using two different design strategies: one designed by a scientist from the literature (reproducing the published core-domain construct, PDB 6M6E), and one designed by the Orbion platform -- an AI, prediction-guided protein-design system (orbion.life) -- which additionally generated the expression and purification protocols (executed scientist-in-the-loop). The platform's construct used an unconventional, longer C-terminal boundary not found in public sequence databases. Since the two constructs differ in more than one feature, we treat them as workflow-level designs throughout. The scientist construct gave a higher initial yield (~2.4x more protein recovered at affinity capture). The platform-designed construct, however, showed a more favourable downstream developability profile: it concentrated higher (1.4 vs 0.7 mg/mL) while remaining more monodisperse by dynamic light scattering (DLS).
The scientist construct, in contrast, aggregated on concentration, so its initial-yield advantage did not survive: in the final concentrated sample the Orbion construct provided the more usable material for downstream studies. Computed for the mammalian host used, the platform had prospectively scored its own design higher (composite 68.7 vs 59.0 for the scientist-designed construct), and its predictions of yield, solubility, and disorder matched the wet-lab outcome. This is a single, deliberately scoped case study, not a population-level benchmark; the two constructs differ in more than one feature, and biological activity was not assayed. Alongside the bottlenecks of this approach discussed here, prediction-guided construct and protocol design, used as a decision aid, has the potential to remove costly iteration cycles from protein production campaigns.

3

Prosculpt: Lowering the Barrier to Computational Protein Design

Olivieri, F.; Konstantinova, A.; Ribnikar, N.; Bizjak, N.; Žnidar, ?.; Abel, K.; Rajh, E.; Ljubetič, A.

2026-06-26 synthetic biology 10.64898/2026.06.25.732351 medRxiv

Top 0.1%

31.3%

Over the past decade, protein design has evolved from a specialized discipline into a broadly accessible approach for engineering and interrogating biological systems. Despite these advances, protein design continues to be a technically challenging task, often requiring programming knowledge to use and combine the different software packages. To address this challenge, we have developed Prosculpt, an easy-to-use protein design pipeline. Prosculpt integrates RFdiffusion for backbone generation, ProteinMPNN for sequence design and multiple structure-prediction platforms (AF2, AF3, ColabFold, Boltz2). Candidate designs are evaluated using customizable Rosetta-based scoring protocols. Each project is specified through a single configuration file, enabling users with minimal computational expertise to perform sophisticated protein design tasks without writing code, while also allowing advanced users to access the full capabilities of the underlying programs. Prosculpt supports a wide range of applications, including design of symmetric homo-oligomers, design of binders, motif scaffolding, partial diffusion and fixed-backbone sequence redesign. By combining these capabilities within a single, user-friendly platform, Prosculpt provides a practical entry point to modern protein design for both novice and expert users.

4

Protein Surface Site Determines the Evolutionary Accessibility of Allosteric Regulation

Dinan, J. C.; McCormick, J. W.; Soni, R.; Thompson, S.; Reynolds, K. A.

2026-07-03 biophysics 10.64898/2026.07.02.735819 medRxiv

Top 0.1%

26.9%

Domain recombination is a major source of new allosteric regulation in both evolved and engineered proteins. However, the sequence and structural features that govern where new allostery may emerge remain poorly understood. Here, we test the hypothesis that the evolutionary accessibility of allosteric regulation following domain insertion is constrained by local surface context, specifically association with pre-existing cooperative networks known as protein sectors. We began with two synthetic domain fusions wherein the Avena sativa light-oxygen-voltage (LOV2) domain was inserted into Escherichia coli dihydrofolate reductase (DHFR) at either a sector-connected or non-sector-connected surface. The insertion sites are separated by only five residues and both DHFR enzymes retain similar catalytic activity, yet the sector-connected version exhibits a light-dependent allosteric phenotype, while the non-sector-connected version does not. Using deep mutational scanning, we measured the effect of nearly all single point mutations on allostery in each chimera. The sector-connected DL121 was significantly more evolvable, possessing numerous allostery-tuning single mutants. In contrast, DL116 lacked statistically significant mutants that introduce allosteric regulation, suggesting the protein surface used by DL116 may be an evolutionary "dead end" for a regulatory phenotype. Surprisingly, DL116 did not show cooperative unfolding at temperatures up to 80 °C, suggesting that enhanced protein stability does not promote the evolvability of allosteric regulation as it does with other phenotypes. Together, our findings show that protein surface context influences the mutational pathways available for allosteric regulation, consistent with the view that sector-connected surface sites harbor a latent capacity for allostery while other locations are more evolutionarily inert.

5

AlphaFlex: Ensembles of the human proteome representing disordered regions

Liu, Z. H.; Zhang, O.; De Castro, S.; Sun, K.; Ghafouri, H.; Attafi, O. A.; Fawzi, N. L.; Tosatto, S. C. E.; Monzon, A. M.; Moses, A. M.; Head-Gordon, T.; Forman-Kay, J. D.

2026-06-23 biochemistry 10.1101/2025.11.24.690279 medRxiv

Top 0.1%

21.9%

More than two thirds of proteins in the human proteome are predicted to contain intrinsically disordered regions (IDRs), which lack stable folded structure. IDRs are critical for biological regulation and organization, as targets for post-translational modifications, and as mediators of biomolecular condensates. To address the pressing need for better structural models enabling functional insight, we developed AlphaFlex to model fully atomistic conformer ensembles for proteins predicted to have IDRs, modeled in the context of AlphaFold folded domains and an implicit bilayer for transmembrane proteins. The AlphaFlex resource provides conformational ensembles of human proteins from the AlphaFold database with identified IDRs in the Protein Ensemble Database that is mirrored in UniProt. This transformative resource of AlphaFlex ensembles provides physically and biologically relevant full-length models for IDR proteins, including scaffold proteins, those with IDR:folded-domain interactions, regulatory and condensate proteins requiring exposed binding elements, conditionally folding IDRs, and transmembrane proteins containing IDRs.

6

Benchmarking AI Protein Structure Predictors Reveals a Persistent Bias in Multi-State Proteins

Ye, M.; Wang, Y.-H.; Brogi, M.; Parks, J. M.; Kuo, K. M.; Gumbart, J. C.

2026-07-11 biophysics 10.64898/2026.07.10.737860 medRxiv

Top 0.1%

18.5%

Protein structure predictors achieve high single-state accuracy, but it remains unclear whether they can recover functionally relevant conformational ensembles or account for the presence of ligands and/or binding partners. Here, we benchmark AlphaFold3, Boltz-2, Chai-1, and BioEmu on four canonical multi-state proteins (Pf-MATE, LAO, SecA, and β2AR), quantifying state bias and sampling breadth against experimental reference structures. Models frequently default to a dominant state represented in the PDB; small-molecule ligands have weak or inconsistent effects, while large protein partners drive clear conformational switching between states. Multiple sequence alignment (MSA)-based approaches (AF-Cluster and random subsampling) recapitulate similar biases, indicating that this behavior is not unique to newer architectures. These results underscore current limitations for multi-state protein structure prediction and structure-guided ligand discovery.

7

Dual Carbohydrate Recognition by the Chitinase-like Protein CHI3L1 Through Distinct Glycosaminoglycan and Chitin-Binding Interfaces

Kurc, O.; Rähse, N.; Gopalswamy, M.; Grossdorf, A.; Gorzelanny, C.; Cramer, J.; Gohlke, H.

2026-06-28 biophysics 10.64898/2026.06.23.733983 medRxiv

Top 0.1%

14.5%

CHI3L1 (YKL-40) is a chitinase-like glycoprotein involved in immune regulation, tissue remodeling, and cancer, yet the molecular principles governing its glycan interactions remain incompletely defined. Previous reports suggested that CHI3L1 can bind to chitin oligosaccharides (COS) and glycosaminoglycan (GAG) ligands; however, the molecular basis and binding sites underlying these interactions remain controversial. Here, a combination of biophysical and computational methods is employed to shed light on carbohydrate interactions of the protein and delineate a potential crosstalk between its glycan-binding interfaces. Our results demonstrate that COS and GAGs bind to distinct, non-overlapping sites on CHI3L1. Both ligand classes exhibit a strong dependence of binding affinity on the degree of polymerization. Molecular dynamics simulations, supported by mutational analysis, identify a GAG-binding site centered on residues R144, R145, and K147 and reveal an additional distal interaction site for longer GAG ligands. Biophysical and biochemical assays fail to confirm a previously proposed allo- or orthosteric interaction between the two binding sites. However, physiologically relevant protein-protein interactions mediated by the chitin-binding site of CHI3L1 are differentially regulated by GAG and COS ligands. COS inhibit binding of galectin-3 to CHI3L1, whereas GAG ligands enhance the affinity between the proteins by ca. 14-fold. Together, these findings establish CHI3L1 as a dual carbohydrate-binding protein with distinct recognition interfaces and reveal a previously unrecognized role for GAGs in modulating CHI3L1-mediated signaling interactions.

8

zsasa: a Zig-based engine for high-throughput solvent accessible surface area at proteome scale

Nagae, T.; Tomii, K.

2026-07-03 bioinformatics 10.64898/2026.06.29.733683 medRxiv

Top 0.1%

13.4%

Solvent accessible surface area (SASA) is widely used to describe protein stability, ligand binding, mutation effects, and protein-protein interfaces. As structural biology workloads expand to predicted-structure collections, trajectories, and large assemblies, SASA tools must combine reproducible calculation with high throughput, low memory use, and workflow-friendly input handling. We present zsasa, a Zig-based SASA engine with command-line and Python interfaces. zsasa implements the established Shrake-Rupley and Lee-Richards algorithms, provides exact f64/f32 modes and an optional bitmask approximation, and supports batch and trajectory workflows, compressed structure inputs, and configurable atom classification including Chemical Component Dictionary (CCD)-based radii for non-standard components. In matched Shrake-Rupley validation on 4,370 Escherichia coli AlphaFold Database structures, exact double-precision zsasa reproduced FreeSASA total SASA values to near numerical identity. In 10-thread batch benchmarks on the E. coli and 23,586-structure human AlphaFold collections, zsasa was 2.94x faster than a FreeSASA batch wrapper in exact f64 mode and up to 9.70x faster in bitmask mode, with roughly 4-8x lower peak memory. Trajectory benchmarks exceeded 1,000 frames/s at tens of megabytes of peak memory, and a 4.5-million-atom PDB stress-test file completed in under five seconds. These results support zsasa as a practical tool for reproducible, low-memory generation of surface-derived structural features at large scale. zsasa is available under the MIT License at https://github.com/N283T/zsasa.
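
As a rough illustration of the Shrake-Rupley algorithm named in this abstract, the sketch below places test points on each atom's probe-expanded sphere and counts the points that no neighbouring atom buries. It is a minimal pure-Python sketch and not zsasa's implementation or API: the function names, the golden-spiral point layout, the 1.4 Å probe radius, the 100-point default, and the brute-force neighbour search are all assumptions chosen for readability.

```python
import math


def sphere_points(n):
    """Return n roughly uniform points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n           # evenly spaced in z
        r = math.sqrt(max(0.0, 1.0 - z * z))    # radius of the circle at height z
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(coords, radii, probe=1.4, n_points=100):
    """Per-atom SASA in the squared units of the input coordinates.

    coords: list of (x, y, z) tuples; radii: list of atomic radii.
    Brute-force O(N^2 * n_points) neighbour check, for illustration only.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe                      # probe-expanded radius of atom i
        accessible = 0
        for ux, uy, uz in unit:
            px = ci[0] + big_r * ux
            py = ci[1] + big_r * uy
            pz = ci[2] + big_r * uz
            buried = False
            for j, (cj, rj) in enumerate(zip(coords, radii)):
                if j == i:
                    continue
                cut = rj + probe
                d2 = (px - cj[0]) ** 2 + (py - cj[1]) ** 2 + (pz - cj[2]) ** 2
                if d2 < cut * cut:              # point lies inside a neighbour
                    buried = True
                    break
            if not buried:
                accessible += 1
        areas.append(4.0 * math.pi * big_r * big_r * accessible / n_points)
    return areas
```

An isolated atom keeps every test point, so its area equals the full expanded-sphere surface; a second atom placed nearby removes the points that fall inside it. Production tools replace the inner loop with a spatial neighbour list, which is where most of the speed differences between implementations arise.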

9

The Hidden Disorder Divide: Reconciling Benchmark Inconsistencies in Intrinsically Disordered Protein Binding Site Prediction

Malhis, N.; Mehdiabadi, M.; Erdos, G.; Gsponer, J.; Kurgan, L.; Tosatto, S. C. E.; Dosztanyi, Z.; Piovesan, D.

2026-06-27 bioinformatics 10.64898/2026.06.24.733783 medRxiv

Top 0.2%

12.8%

Computational predictors of protein-binding sites within intrinsically disordered regions (IDRs) show highly inconsistent performance across high-quality benchmark datasets. To understand the origins of these discrepancies, we systematically compared predictors across three independent test sets: two CAID datasets updated with the latest DisProt annotations and a composite dataset (DBs) assembled from DIBS, FuzDB, IDEAL, and MFIB. Predictors trained predominantly on DisProt data achieved substantially higher AUCs on the CAID sets but performed poorly on the DBs. In contrast, predictors trained on older, low-quality PDB-based datasets showed balanced performance across all sets, with a slight preference for DBs. Predictors with mixed training exposure displayed intermediate behavior. Through controlled experiments using identical CNN architectures and feature analysis, we demonstrate that the dominant factor driving these performance differences is the intrinsic disorder propensity of the binding sites themselves. Binding residues in DisProt-based datasets exhibit markedly higher average disorder propensity scores than those in PDB-derived datasets. This previously unrecognized selection bias -- literature studies preferentially characterizing more disordered binding sites, while PDB-derived annotations capture less disordered ones -- effectively splits IDR-protein binding sites into two distinct categories. Predictors optimized on one category therefore generalize poorly to the other. Binding-site length and sequence conservation play only minor or negligible roles in explaining the observed inconsistencies. These findings highlight a critical limitation in current benchmarking practices and training strategies for IDR-binding site prediction, underscoring the need for more balanced and disorder-aware reference datasets. Finally, the diagnostic techniques introduced here could prove valuable beyond the specific application examined in this study.

10

Structural Determinants of Catalytic Directionality in an AMP-Forming Acetyl-CoA Synthetase from Syntrophus aciditrophicus

Yaghoubi, S.; Dinh, D. M.; Thomas, L. M.; Wofford, N. Q.; McInerney, M. J.; Follmer, A. H.; Karr, E. A.

2026-07-07 biochemistry 10.64898/2026.07.06.736832 medRxiv

Top 0.2%

12.7%

Acetyl-coenzyme A (CoA) is a central metabolic intermediate that links carbon and energy metabolism across all domains of life. The conversion of acetate and acetyl-CoA is carried out by three enzyme pathways: acetate kinase/phosphotransacetylase, ADP-forming acetyl-CoA synthetase, and AMP-forming acetyl-CoA synthetase (Acs). Acs enzymes serve critical physiological roles across diverse organisms generally by catalyzing a reversible two-step reaction forming acetyl-CoA and AMP from acetate and ATP. Isolated from the wastewater reclamation facility in Norman, Oklahoma, Syntrophus aciditrophicus strain SB (Sa) relies on an AMP-forming acetyl-CoA synthetase (SaAcs1) that favors synthesizing acetate and ATP from acetyl-CoA and AMP, in contrast to all previously characterized Acs enzymes. The origin of this preference and the structural determinants of both the thioester-forming step and catalytic directionality remain poorly understood. Here, we report a 2.2 Å crystal structure of full-length SaAcs1 in the adenylation conformation with acetyl-AMP bound in the active site. Structural comparison to the extensively characterized Acs enzymes from Salmonella enterica (SeAcs) and Cryptococcus neoformans (CnAcs) revealed a displaced CoA-binding loop in SaAcs1. Enzymatic assays confirmed that SaAcs1 preferentially catalyzes the ATP-forming reaction. Site-directed mutagenesis demonstrated that reversion of two residues, G196 and T197, at the beginning of the CoA-binding loop to the consensus sequence repositions the loop and shifts catalytic preference toward the AMP-forming direction. Together, these results establish the CoA-binding loop and G196 and T197 as the primary structural determinants of directional preference in SaAcs1.

11

Hydration and H/D exchange-dependent infrared signatures of the GCN4 leucine zipper

Bhuvanendran, H.; Brunner, C. M.; Kempf, H.; Moro, J. L.; Roubieu, E.; Turbant, F.; Mateus, A.; Lin, H.; Das, L.; Malyshev, D.; Johns, B.; Parracino, A.; Pastore, A.; Peters, J.; Cortajarena, A. L.; Zanetti Polzi, L.; Maccaferri, N.

2026-06-26 biochemistry 10.64898/2026.06.26.731617 medRxiv

Top 0.2%

12.6%

Attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectroscopy of proteins in aqueous solution is often limited by water absorption and other optical artifacts. To overcome these limitations, we evaluated the structural features and hydrogen-deuterium exchange (HDX) kinetics of the α-helical protein GCN4 in both hydrated (wet) and vacuum-dried (dry) states. While solvent heavily masks the second-derivative spectra of wet samples, vacuum drying yielded a thin, protein-rich film on the ATR crystal, significantly enhancing the signal-to-noise ratio and resolving the protein features without altering the native structure. Dry-state analysis clearly resolved the Amide I, Amide II, and deuterium-shifted Amide II' (1450 cm⁻¹) bands. Notably, second-derivative analysis of the dry spectra of the HDX samples revealed a bimodal Amide I distribution consisting of a stationary band at 1653 cm⁻¹ from the solvent-inaccessible regions and an isotopically sensitive band shifting from 1648 cm⁻¹ to 1644 cm⁻¹ from solvent-accessible regions. These results demonstrate that vacuum-dried ATR-FTIR spectroscopy effectively eliminates solvent masking, providing the spectral clarity required to resolve discrete α-helical sub-populations after deuteration.

12

Location dependence of protein intrinsic disorder in Drosophila melanogaster

Abdulla Daanaa, H. S.; Kuraku, S.; Akashi, H.; Saito, K.

2026-07-03 bioinformatics 10.64898/2026.07.02.732782 medRxiv

Top 0.2%

12.5%

The relevance of protein structural flexibility in function remains contested, but experimental and computational evidence continues to accumulate. Many efforts to address this investigate intrinsic disorder, which commonly refers to peptide segments or entire protein sequences that presumably lack structure and exhibit high flexibility/conformational heterogeneity under physiological conditions. These efforts face challenges such as conflicting computational predictions and ambiguous relationships among intrinsic disorder locations and other protein properties. We address these challenges at a genome-wide scale in Drosophila melanogaster using residue-level predictions for various protein properties. We employ single and consensus approaches to quantify the prevalence of intrinsic disorder and attempt to infer function by testing for differences along protein sequences. Intrinsic disorder is likely more common at terminals than internal regions, and amino acid frequencies can vary substantially between regions in a manner that plausibly reflects functions of intrinsic disorder, rather than only proteome-wide effects. Tertiary structure potentially underlies the prevalence of intrinsic disorder along protein sequences; this prevalence varies more in a putatively solvent-exposed context than a solvent-buried one. Protein-binding appears to be a main function of intrinsic disorder, and we find support consistent with the notion that structural flexibility fosters binding plasticity, and show that location and protein length are factors in this relationship. Nucleic acid-binding and linker are ostensibly less common disorder functions than protein-binding, but nucleic acid-binding seems more localized at terminals. Residue-level estimates of selection pressure indicate that disordered regions generally evolve under weaker sequence constraints than structured regions, except at the N-terminal region. 
Biases in disorder prediction are a considerable factor in many of the observations, but unlikely a full explanation. The findings strengthen support for functional relevance of flexibility, offer insight into protein architecture and function, and lend impetus for experimental inquiry.

13

Structural and Energetic Determinants of Monobody Recognition of Oncogenic KRAS Variants

Kumar, A.; Huang, Y.-m. M.

2026-07-10 biochemistry 10.64898/2026.07.09.737552 medRxiv

Top 0.2%

12.5%

Monobodies are engineered binding proteins that recognize extended protein surfaces and offer advantages over small-molecule inhibitors for targeting challenging KRAS oncoproteins. Monobody 12D4 exhibits high affinity and selectivity for the oncogenic KRAS(G12D) mutant, but the molecular determinants governing its recognition and the basis for its mutant selectivity remain poorly understood. Here, we combined molecular dynamics simulations and energy calculations to characterize the interactions between monobody 12D4 and WT KRAS as well as four clinically relevant oncogenic variants (G12C, G12D, G12V, and G12R) in both GTP- and GDP-bound states. Our simulations revealed that 12D4 recognition depends on a conserved hydrophobic interaction network centered on the monobody FG loop (residues L77, F78, and W79). This network forms stable contacts with KRAS Switch II and the α3-helix. The energy calculations also showed that residue K75 of 12D4 formed a mutation-specific electrostatic interaction with KRAS G12D. This interaction contributed significantly to the affinity of 12D4 toward this mutant and was absent in the other variants. No monobody currently exists for targeting KRAS G12R in either nucleotide state, and no monobody selectively targets KRAS G12C and G12V in the GDP-bound inactive state. To address these gaps, we performed computational redesign at residue 75. We identified mutations (K75Q, K75Y, and K75M) that enhanced predicted binding to G12C, G12R, and G12V variants through reorganization of interfacial contacts. Our work establishes a structural framework for understanding KRAS-monobody recognition and provides a rational foundation for engineering variant-selective monobodies with improved affinity toward previously untargetable KRAS mutants.

14

Comparative Modelling of Actin-Tropomyosin Interfaces

Menon, R.; Balasubramanian, M.; Sowdhamini, R.

2026-07-10 bioinformatics 10.64898/2026.07.06.736648 medRxiv

Top 0.2%

11.8%

Tropomyosins are coiled-coil dimers that polymerize head-to-tail along actin filaments. They stabilize distinct filament populations and regulate the access of myosins and actin-binding proteins in both muscle and non-muscle contexts. Despite their central regulatory role, how filament length and isoform identity of different tropomyosin homologues might modulate actin affinity is not completely understood, especially across species. Here, we present a stepwise computational docking pipeline combining AlphaFold2-Multimer coiled-coil models, experimentally informed residue-level restraints, and pseudo-energy analysis via PPCheck to build and evaluate actin-tropomyosin co-polymer models for three isoforms: human TPM1 (hTPM1; 284 residues), human TPM4 (hTPM4; 248 residues), and Schizosaccharomyces pombe Cdc8 (SpCdc8; 161 residues). Interface energetics reveal a consistent hierarchy in which the shortest filament, SpCdc8, achieves the most stabilizing and residue-rich actin contacts, consistent with reduced cumulative geometric penalty along the actin helix. Among human isoforms, hTPM1 forms stronger interfaces with actin than hTPM4. The hTPM1-actin model also exhibits higher contact density and additional energetic hotspots, in agreement with the experimentally established slower exchange kinetics of TPM1 isoforms on actin filaments relative to TPM4. Hotspot mapping identifies conserved acidic residues at equivalent positions across all three isoforms, emphasizing the importance of electrostatic anchor points in maintaining interface integrity across diverse evolutionary contexts. Modeling of four temperature-sensitive SpCdc8 mutations (A18T, R21H, E31K and E129K) reveals that these substitutions substantially destabilize the coiled-coil dimer without significantly affecting actin interactions, suggesting that subtle regulatory failure arises from compromised longitudinal cable continuity rather than from direct loss of actin affinity. 
Taken together, our results support a hierarchical model of tropomyosin dimer stability and actin-tropomyosin recognition in which filament length imposes a geometric baseline on interface stability, onto which isoform-specific sequence evolution superimposes functional tuning. The tropomyosin homologues we studied appear to retain conserved electrostatic hotspots, thereby providing a common structural scaffold across tissues and organisms.

15

Structural Bioinformatics of Four Human Aquaporins and Their Water-Soluble QTY Analogs

Zhang, S.; Xiao, E.

2026-06-30 bioinformatics 10.64898/2026.06.24.734367 medRxiv

Top 0.2%

11.8%

Human aquaporins (AQPs) are essential membrane channels, yet their inherent hydrophobicity complicates structural and functional studies. We present the systematic application of the QTY code to human AQPs, integrating it with AlphaFold 3 structure prediction to show that four representative human AQPs (AQP1, AQP3, AQP4, AQP7) can be converted into water-soluble analogs while maintaining their conformation. This approach offers a novel platform for editing challenging membrane proteins. The QTY code was applied to the transmembrane regions of the four selected AQPs. Subsequently, the structures of the water-soluble QTY analogs were predicted using AlphaFold 3. The predicted structures were superposed with cryo-EM- or X-ray-determined native structures in PyMOL. Further analyses included root-mean-square deviation (RMSD) calculations, visualization of hydrophobic surface reduction, and inspection of conserved protein-ligand binding ability. After applying the QTY code, sequence changes between native AQPs and their QTY analogs were significant (42.86-48.80%). Nevertheless, their structures superposed well in analyses, with only slight deviations (RMSD < 0.6 Å). In addition, the surface hydrophobicity of all QTY-edited AQPs was significantly reduced. Importantly, molecular contacts between the cholesterol ligand and protein were largely preserved for both native AQP1 and its QTY analog. Finally, all AlphaFold 3-predicted structures for AQPs have high confidence values (pLDDT > 90; pTM ~0.83), supporting the reliability of the predicted structures. The findings demonstrate that membrane protein hydrophobicity can be edited and reduced without compromising fold integrity or functional architecture. Integration of the QTY code with AlphaFold 3 affords a high-throughput platform for designing water-soluble, structurally faithful analogs of challenging membrane proteins.
Such a strategy can provide a potent platform for detergent-free biochemical studies and water-soluble analogs for therapeutic monoclonal antibody discoveries, thus advancing research of this pharmacologically important protein family.
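
The two numbers quoted in this abstract, percent sequence change and RMSD, can be reproduced in outline with a few lines of Python. The sketch below is hedged throughout: the substitution table (L to Q, I and V to T, F to Y) is the commonly described form of the QTY code and is an assumption here because the abstract does not list it, the toy sequence and transmembrane range are hypothetical, and `rmsd` assumes the two coordinate sets have already been superposed (for example in PyMOL, as the abstract describes).

```python
import math

# Commonly described QTY substitutions (assumption: not stated in the abstract).
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def apply_qty(seq, tm_regions):
    """Substitute L/I/V/F only inside the given 0-based, end-exclusive regions."""
    out = list(seq)
    for start, end in tm_regions:
        for k in range(start, end):
            out[k] = QTY_MAP.get(out[k], out[k])
    return "".join(out)


def percent_changed(a, b):
    """Percentage of positions that differ between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diff = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * diff / len(a)


def rmsd(p, q):
    """RMSD between two already-superposed lists of (x, y, z) coordinates."""
    if len(p) != len(q) or not p:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2 for u, v in zip(p, q) for a, b in zip(u, v))
    return math.sqrt(total / len(p))


# Hypothetical toy example: residues 2-5 are treated as transmembrane.
native = "MKLIVFAGSLLVFA"
analog = apply_qty(native, [(2, 6)])
print(analog, round(percent_changed(native, analog), 2))
```

Because the substitutions preserve sequence length, a position-by-position comparison is enough and no alignment step is needed for the percent-change figure.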

16

Structural and Biochemical Analysis of the CABIT1 Domain of THEMIS

Negron Teron, K. I.; Ortiz-Salazar, D.; Beyett, T. S.

2026-06-25 biochemistry 10.64898/2026.06.24.734275 medRxiv

Top 0.2%

11.7%

T cells are important components of the adaptive immune system and develop through a selection process regulated by signaling through the T-cell receptor (TCR). Thymocyte-Expressed Molecule Involved in Selection (THEMIS) is a TCR-proximal protein that modulates the activity of Shp1 phosphatase to influence TCR signaling during development. THEMIS has been shown to both activate and inhibit Shp1, but the molecular mechanisms of these functions are poorly understood. THEMIS contains two rare Cysteine All-Beta In THEMIS (CABIT) domains, the N-terminal of which interacts with Shp1 and is likely responsible for modulation of its phosphatase activity. Herein, we report the first crystal structure of the THEMIS CABIT1 domain. While a portion of the CABIT1 domain is poorly resolved, it appears to share the same overall fold observed in our recent CABIT2 crystal structure and AlphaFold predictions. We show that phosphorylation of the CABIT1 domain by LCK is required for association with Shp1 and that phosphorylated CABIT1 can protect Shp1 from oxidation and inhibition by reactive oxygen species (ROS), which may serve as a mechanism by which THEMIS enhances Shp1 activity.

17

BioMetAll v2.0: Introducing Scores, Metal Discrimination, and Side-Chain Descriptors for Predicting Metal-Binding Sites in Proteins.

Marechal, J. D.; Fernandez Diaz, R.; Pena Losada, R.; Sanchez Aparicio, J. E.; Gao, W.; Alemany, M.

2026-07-12 bioinformatics 10.64898/2026.07.09.737562 medRxiv

Top 0.2%

11.6%

Predicting the location of metal-binding sites in proteins is crucial for fundamental biological questions and biotechnological applications. Over the past decade, the rise in metal-bound protein structures in the Protein Data Bank, combined with advanced statistical models such as deep learning, has accelerated the development of metal-binding site prediction tools. Several approaches are now available, offering high-quality benchmarks and predictive performance. Our initial development in this area is BioMetAll, whose first version was based on backbone pre-organization. Here, we introduce its second version, featuring two major updates: 1) metal-specific scoring functions and 2) prediction using backbone geometry alone or in combination with first coordination sphere descriptors. Apart from demonstrating metal sensitivity and yielding better benchmarking results, this new version makes it possible to assess how considering the metal's first coordination sphere, versus backbone pre-organization alone, influences how metallic species bind to proteins.

18

ComplexDesign: sequence-hallucination design of protein binders bridging multiple proteins

Xu, J.; Ren, M.; Qi, N.; Zhang, X.; He, Z.; Yu, C.; Bu, D.

2026-06-24 bioinformatics 10.64898/2026.06.21.733655 medRxiv

Top 0.3%

10.8%

Motivation: Designing multichain protein complexes requires coordinating the folding of component proteins with the formation of their interfaces. Existing methods, however, remain limited in their ability to satisfy these requirements simultaneously, especially for trimeric and tetrameric complexes. As an important practical scenario, designing a binder that bridges two target proteins into a ternary complex requires flexibility in the relative arrangement of the two targets, posing an additional challenge to existing design methods. Results: We present ComplexDesign, a hallucination-based approach for multichain protein design. ComplexDesign performs structure-prediction-guided sequence optimization to simultaneously fold each protein chain and form inter-chain interactions that bind them together. To provide the flexibility required to appropriately arrange these target proteins, ComplexDesign introduces a specialized masking mechanism that enables exploration of possible relative arrangements rather than being limited to the predefined ones. Across a comprehensive set of benchmarks with various chain lengths, ComplexDesign outperformed existing methods in the unconditional design of dimers, trimers, and tetramers, achieving a design success rate exceeding 50%, supporting its capability for multichain complex design. Furthermore, in the case of multi-target binder design, ComplexDesign produced high-confidence, self-consistent ternary complexes for 8 out of 10 target pairs. These results establish ComplexDesign as an effective tool for multichain protein design, with particular utility for designing binders that bridge two target proteins. Availability and implementation: The source code of ComplexDesign will be made publicly available upon publication.

19

Challenges in reconstituting the peroxiredoxin 2:STAT3 transient redox-relay complex in vitro

Malo Pueyo, J.; Baranova, E.; Wahni, K.; Dubach, V. R. A.; Janvier, S.; Vertommen, D.; Murphy, B. J.; Ezerina, D.; Messens, J.

2026-07-10 molecular biology 10.64898/2026.07.03.731362 medRxiv

Top 0.3%

10.1%

Peroxiredoxin 2 (Prdx2) mediates redox signaling by transferring oxidative equivalents to target proteins such as STAT3, a redox-sensitive transcription factor implicated in inflammation and cancer. Although this interaction has been demonstrated in cells, reconstituting the Prdx2:STAT3 complex in vitro remains challenging due to its transient and redox-dependent nature. Here we test various conditions to stabilize the complex between tagless Prdx2 and the core fragment of STAT3 (CF-STAT3), including oxidants, detergents, the facilitator Annexin A2, anaerobic environments, and CovalX crosslinking. Complex formation was assessed via mass photometry, analytical size-exclusion chromatography (SEC), SEC-MALS, and electron microscopy (EM). No stable complex was observed under standard conditions. Anaerobic environments briefly stabilized the interaction, but cryo-EM could not resolve the structure. CovalX crosslinking yielded short-lived but homogeneous complexes. We found that Prdx2 is highly susceptible to hyperoxidation at its peroxidatic cysteine, particularly in the presence of DTT or excess H2O2, resulting in loss of function. Maintaining non-reducing conditions during purification preserved Prdx2 in an oxidation-competent state, promoting formation of the disulfide bond between the peroxidatic and resolving cysteines and thereby enabling reproducible detection of a weak complex with CF-STAT3. Our findings establish a framework for studying redox-relay protein complexes in vitro and highlight the importance of oxidation state management during protein handling.

20

Prot2Prop: Structure-informed multitask protein property prediction

Gharaie Amirabadi, D.; Jackson, C.; Kim, D. S.; Sprang, M.; Amani, K.

2026-06-29 bioinformatics 10.64898/2026.06.28.735009 medRxiv

Top 0.3%

9.8%

Protein engineering often relies on separate models for related developability properties, limiting efficiency and transfer across tasks. We present Prot2Prop, a multitask framework based on a frozen ProstT5 encoder with shared and task-specific adapters for joint prediction of six protein properties: material production, solubility, temperature stability, aggregation propensity, expression yield, and folding stability. Across held-out test data, Prot2Prop achieved strong performance on both classification and regression tasks, including AUROC values ranging from 0.86 to 0.98 for classification endpoints and Spearman correlations ranging from 0.73 to 0.86 for regression endpoints. The model achieved particularly strong performance for temperature stability (AUROC = 0.98) and aggregation propensity (Spearman = 0.86). Post-hoc calibration further improved regression accuracy, reducing folding stability MAE from 0.67 to 0.48. These results demonstrate that parameter-efficient multitask adaptation of protein language models can provide accurate and unified prediction of diverse protein developability properties.